Strong Scaling Study
Adam Lyon, Muon g-2 IRMA Analysis, Fermilab, October 2020
This notebook examines strong scaling properties of Julia IRMA jobs making just one plot. I ran jobs with 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, and 20 nodes, always with 32 tasks per node. If I examine the timings of the part of the Julia run that includes opening and reading the HDF5 input file, creating the histogram, and running MPI.Reduce and MPI.Gather, I see the expected strong scaling: the jobs get faster with more nodes because each task has less of the file to read. If I examine the total elapsed time reported by the batch system, the scaling is less clear; it almost appears that more nodes make Julia run more slowly. One speculation is that there is contention when loading packages. A next step could be to try PackageCompiler.jl to make a Julia "app" with a fast startup time.
This notebook answers issue "Analyze results from Strong Scaling jobs" (#18); the code may be found in PR #20. This file is IRMA/analyses/018_StrongScaling/StrongScaling.jl.
What is this notebook?
This is a Pluto.jl notebook and the code here is written in Julia. It is like a Jupyter notebook, but with important differences. The most important difference is that results appear above the code. Another important difference is that Pluto.jl notebooks are reactive: the notebook tracks cell-to-cell dependencies, and when a cell changes, its dependent cells update at the same time, so the notebook is always in a consistent state. This means that even while you are looking at a static HTML representation of the notebook, you can be assured that it is consistent and up to date. You'll see that some results have a little triangle next to them; clicking on it opens an expanded view of the results.
The main results are reproduced at the top, with discussion, in the Results section. Plots are stored in variables, which you can see below each plot. All of the code for this notebook is in the Code section, where you can see how each plot was made.
Introduction
This notebook examines strong scaling properties of my Julia IRMA jobs that make one plot.
On 10/22, I ran IRMA/jobs/003_StrongScaling/strongScalingJob.jl from commit 82b715b, answering issue #3. This job reads the cluster energy data from Muon g-2 era 2D and makes a plot of that energy for all clusters. The data is split evenly among all the MPI ranks. I tried 2 nodes through 10 nodes, always with 32 tasks per node (advice from Marc Paterno). I ran the jobs in the debug queue.
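The per-rank read pattern described above (open the shared HDF5 file, then read only a slice of the dataset) can be sketched as follows. This is illustrative only, assuming HDF5.jl: the file name and the dataset name "energy" are made up for the sketch, and the real job computes each rank's slice with partitionDS.

```julia
# Illustrative sketch (not the job's code): each rank opens the shared HDF5
# file and reads only its own contiguous slice of the dataset.
using HDF5

fname = tempname() * ".h5"                        # stand-in for the real input file
h5write(fname, "energy", collect(Float32, 1:100)) # make a tiny stand-in dataset

h5open(fname, "r") do f
    ds = f["energy"]        # open the dataset (no data read yet)
    slice = ds[11:20]       # read only this rank's range
    @assert length(slice) == 10
end
```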
On 10/27 I ran three more jobs with 12, 15 and 20 nodes (still 32 tasks per node) respectively. These jobs ran in the debug queue.
Note that on 10/25, I ran jobs with 12, 15 and 20 nodes in the regular queue, because the debug queue was very full. I ran the 12-node job twice: the first attempt failed with a strange error, I think due to the CSCRATCH filesystem crashing (it's been a bad month for Cori); the second attempt ran fine. Because the elapsed time and memory usage looked strange, I replaced these runs with runs in the debug queue (see 10/27 above). See a comparison of the debug and regular queue runs in Code.
All jobs ran on Haswell. Data came from CSCRATCH.
I recorded MPI timings from the Julia run. I also dumped SLURM accounting information for analysis.
Results
There are several types of results.
Histogram comparison
Each rank makes a histogram of cluster energy. All of these histograms are sent to the root rank with MPI.Gather and saved in the output file. Furthermore, I "reduce" the histograms by merging them into one with MPI.Reduce. See Histogram Comparison in the Code section, where I compare these histograms to be sure that the reduced one is correct. Note that I'm using a static histogram defined in the IRMA.jl package that MPI can manipulate directly without the need for serialization/deserialization. The tests worked fine.
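The reason a static, fixed-bin histogram can go through MPI.Reduce directly is that merging reduces to elementwise addition of the bin-count vectors, an isbits buffer MPI can sum without serialization. A standalone sketch of that idea (the helper fill_counts! is made up for illustration; IRMA's SHist will differ in detail):

```julia
# Fixed binning, matching the notebook's 0:25:5000 histogram
edges = 0.0f0:25.0f0:5000.0f0

# Hypothetical helper: fill a plain count vector, one entry per bin
function fill_counts!(counts, xs, edges)
    for x in xs
        i = searchsortedlast(edges, x)            # bin index of x
        1 <= i <= length(counts) && (counts[i] += 1)
    end
    counts
end

c1 = fill_counts!(zeros(Int, length(edges) - 1), Float32[10, 30, 30, 4990], edges)
c2 = fill_counts!(zeros(Int, length(edges) - 1), Float32[12, 60], edges)

# Elementwise addition is exactly what MPI.Reduce with + computes on the buffers
merged = c1 .+ c2
@assert sum(merged) == sum(c1) + sum(c2)
```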
MPI Timing information
I record the time in the job with an IRMA Stopwatch. The stopwatch uses MPI.Wtime under the hood. The times are recorded as follows.
| Label | Meaning |
|---|---|
| start | After packages are loaded, functions are defined, and MPI.Init is called |
| openedFile | After the h5open statement |
| openedDataSet | After the energy dataset is opened (but no data read yet) |
| determineRanges | After the ranges to examine are determined with partitionDS |
| readDataSet | After the dataset is read (this reads the actual data for the rank) |
| makeHistogram | After the data is histogrammed |
| gatheredHistograms | After all the histograms have been gathered to the root rank |
| reducedHistograms | After the histograms have been reduced to the root rank |
| gatherRankLogs | After the rank's log info has been gathered to the root rank |
Timing of anything before start is not recorded.
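The labeled marks above can be thought of as a list of (label, MPI.Wtime) pairs, where only differences between consecutive marks are meaningful. A minimal sketch of that idea, using time() in place of MPI.Wtime() so it runs without MPI (IRMA's Stopwatch API may differ):

```julia
marks = Pair{Symbol,Float64}[]
mark!(label) = push!(marks, label => time())   # stand-in for MPI.Wtime()

mark!(:start)
sleep(0.01)                                    # pretend work: open the file
mark!(:openedFile)

# Convert absolute marks into per-step durations
deltas = [marks[i].first => marks[i].second - marks[i-1].second
          for i in 2:length(marks)]
@assert all(d -> d.second >= 0, deltas)
```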
Let's look at the timing plots. Note that you can see all the timing plots in the Code section. Some representative plots will be reproduced here.
Here is the run using two nodes (and 32 ranks per node).
```julia
plot(plotsForRun(gdf[1])..., size=(1000,700), layout=(5,2))
```

First, you see that each rank read 263,062,959 rows, or one less. Just opening the file took over 5 seconds, with the root rank taking a little bit longer. Reading the dataset itself takes the majority of the time. The spread of times over ranks is rather interesting. Some ranks took a bit longer to make the histograms. Some ranks were significantly slower in the MPI.Gather, though the structure perhaps shows some clever consolidating in MPI. Reducing the histograms involves every other rank taking about a second longer than the others. Gathering the rank logs is very fast. You see all of this structure in the total MPI time.
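That "or one less" falls out of how N rows are split across R ranks: divrem(N, R) gives a base chunk plus a remainder that the first few ranks absorb. A standalone sketch (partitionDS in IRMA presumably does something equivalent; this version is illustrative only):

```julia
# Assign a rank (0-based) its 1-based row range out of N total rows and R ranks
function partition(N, R, rank)
    base, extra = divrem(N, R)                   # base rows per rank + leftovers
    lo = rank * base + min(rank, extra) + 1      # first `extra` ranks get one more row
    hi = lo + base - 1 + (rank < extra ? 1 : 0)
    lo:hi
end

# 64 ranks (2 nodes x 32): lengths come out as 263_062_959 or one less
ranges = [partition(16_836_029_329, 64, r) for r in 0:63]
@assert sum(length, ranges) == 16_836_029_329
@assert maximum(length, ranges) - minimum(length, ranges) == 1
```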
Let's look at ten nodes
```julia
plot(plotsForRun(gdf[9])..., size=(1000,700), layout=(5,2))
```

There's quite a bit of structure in these plots. Note that for readDataSet, where the data is actually read in, some nodes appear to be faster than others. Reading is fast here, but its time still dominates, so the structure of reducedHistograms becomes clear in the total time.
Here are twenty nodes
```julia
plot(plotsForRun(gdf[12])..., size=(1000,700), layout=(5,2))
```

Here is a box plot of the total time vs. number of nodes...
```julia
strongScalingPlot
```

Since the job is only as fast as its slowest rank, let's determine the maximum total time. We can then also determine the predicted cost of the job.
```julia
costMPIPlot
```

Information from the batch system
The above, however, is not the whole story. I can also get timing information from the batch system with the sacct command; I've done this for the jobs run here.

```julia
timingComparisonPlot
```

This plot compares the MPI total time (green points) to the total time the batch system reported for the Julia srun step (orange points). The elapsed time for the entire batch job is shown in the blue points. For 6 nodes, the Julia time is longer than the total batch time, which seems nonsensical.
There is a very large discrepancy between the MPI total time and the total Julia and batch time. Furthermore, the total Julia/Batch time for 12, 15 and 20 nodes is markedly higher. That may be due to running in the regular queue, though I can't think of a good reason for this.
The significant difference between the batch times and the MPI time must be Julia startup, package loading and initializing MPI. I don't have specific timings for those steps. Further investigation is required.
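One low-effort way to close that gap is to wall-clock the missing phases directly in the job script, before the MPI stopwatch starts. A hedged sketch (the package loaded here is a stand-in for whatever the job actually loads):

```julia
# Record how long a startup phase takes using plain wall-clock time.
# @elapsed works before MPI is initialized, unlike MPI.Wtime.
t_pkgs = @elapsed @eval using Statistics   # stand-in for the job's heavy packages
# t_mpi = @elapsed MPI.Init()              # would time MPI startup the same way
println("packages loaded in ", round(t_pkgs; digits=3), " s")
@assert t_pkgs >= 0
```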
Let's look at the costs for these jobs as computed from the sacct data. They match the cost data in IRIS.
```julia
batchJobCostPlot
```

Let's look at other information from the accounting data.
Here is the maximum RSS reported by a task in MB.
```julia
maxRSSPlot
```

Here is the maximum VM size. Note that the scale is GB. This looks almost reasonable, except for the big dip for 15 nodes.
```julia
maxVmsizePlot
```

We can look at the maximum bytes read (note the scale is MB)...
```julia
maxDiskReadPlot
```

and maximum bytes written (note the scale is KB)...
```julia
maxDiskWrite
```

Conclusions
It is clear that there is a significant fraction of the Julia run that I am not including in my MPI timing. When I look at the scaling of timing reported by the batch system, the scaling is nonsensical, especially for the 12, 15 and 20 node runs. Could that be due to running them in the regular queue instead of the debug queue? How much time does it take to start Julia, load packages and initialize MPI? Could there be contention for disk when Julia is starting and packages are loaded?
Next steps
Try to add timing information for loading packages and initializing MPI. Perhaps try PackageCompiler.jl to speed up loading. Run everything in the debug queue.
Code
```julia
# Make the screen wide
html"""<style>main { max-width: 1100px;}</style>"""
```

```julia
# Activate the environment
begin
    import Pkg
    Pkg.activate(".")  # Activate the correct environment
    using Revise
end
```

```julia
# Load initial packages to read the results
using IRMA, JLD2, FileIO, Glob, Pipe
```

```julia
const datapath = "/Users/lyon/Development/gm2/data/003_StrongScaling/"
```

```julia
histoFiles = @pipe glob("*32.jld2", datapath) |> basename.(_)
```

"histos_10x32.jld2"
"histos_12x32.jld2"
"histos_15x32.jld2"
"histos_20x32.jld2"
"histos_2x32.jld2"
"histos_3x32.jld2"
"histos_4x32.jld2"
"histos_5x32.jld2"
"histos_6x32.jld2"
"histos_7x32.jld2"
"histos_8x32.jld2"
"histos_9x32.jld2"

One file plots
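One caveat with the histoFiles list: glob returns the names in lexical order (10, 12, 15, 20, 2, 3, ...), not by node count. A small sketch of sorting them numerically by parsing the histos_&lt;n&gt;x32.jld2 pattern (nnodes is a hypothetical helper, not part of the notebook's code):

```julia
# Parse the node count out of "histos_<n>x32.jld2" and sort by it
nnodes(f) = parse(Int, match(r"histos_(\d+)x32", f).captures[1])

files  = ["histos_10x32.jld2", "histos_2x32.jld2", "histos_15x32.jld2"]
sorted = sort(files, by = nnodes)
@assert sorted == ["histos_2x32.jld2", "histos_10x32.jld2", "histos_15x32.jld2"]
```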
Get our footing by looking at one results file.
"allTimings"
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
1.60338e9
"allHistos"
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [397, 8987, 67564, 271818, 671911, 983846, 1127727, 1143901, 1087948, 1019103 … 21, 8, 11, 14, 19, 9, 16, 9, 11, 6])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [401, 9183, 67528, 273182, 674859, 985809, 1129689, 1145772, 1086231, 1020261 … 17, 12, 16, 16, 14, 14, 9, 16, 12, 15])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [417, 8994, 66754, 267877, 665840, 977760, 1124497, 1142303, 1089946, 1020811 … 14, 12, 13, 17, 10, 12, 13, 8, 12, 10])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [396, 8937, 65775, 265012, 660740, 972894, 1120848, 1138652, 1084756, 1018960 … 7, 12, 18, 12, 22, 19, 11, 11, 10, 9])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [387, 9236, 66686, 267804, 662948, 974127, 1123196, 1139768, 1086556, 1019997 … 17, 12, 13, 11, 14, 18, 13, 6, 7, 9])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [418, 8876, 66028, 267306, 665525, 979543, 1124445, 1139276, 1086550, 1019662 … 14, 14, 16, 10, 14, 10, 12, 12, 11, 3])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [401, 8680, 64799, 263105, 655852, 969226, 1119106, 1139437, 1084320, 1019810 … 14, 14, 19, 13, 14, 11, 15, 13, 7, 9])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [412, 9228, 68135, 274125, 675318, 985494, 1130227, 1144245, 1089163, 1021727 … 15, 14, 17, 18, 8, 9, 7, 13, 9, 7])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [437, 8958, 66390, 268907, 667223, 979503, 1124677, 1140201, 1087651, 1019256 … 17, 13, 16, 18, 13, 5, 15, 11, 14, 5])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [395, 9009, 66932, 270916, 670458, 980964, 1127087, 1142464, 1088396, 1020108 … 15, 21, 15, 13, 11, 13, 11, 12, 17, 14])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [396, 8830, 65964, 268068, 664834, 975495, 1122880, 1141053, 1086100, 1019054 … 22, 25, 14, 10, 18, 15, 10, 9, 7, 8])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [422, 9049, 66293, 266579, 663726, 973925, 1120590, 1138615, 1085795, 1020930 … 20, 18, 21, 12, 10, 8, 9, 10, 10, 5])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [391, 8870, 67954, 273028, 675640, 986455, 1130665, 1143607, 1089265, 1022054 … 13, 18, 8, 15, 12, 15, 11, 14, 9, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [398, 8934, 65967, 266733, 663089, 974166, 1120549, 1138607, 1085924, 1022009 … 21, 16, 18, 14, 15, 14, 12, 11, 14, 12])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [428, 8988, 66502, 269404, 668487, 980000, 1123659, 1140791, 1091580, 1019476 … 26, 11, 8, 20, 10, 7, 11, 11, 5, 7])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [430, 8939, 65405, 265496, 660106, 971303, 1119452, 1138270, 1086042, 1022566 … 12, 14, 19, 14, 10, 10, 12, 9, 10, 8])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [377, 9201, 65904, 268338, 664649, 977059, 1124179, 1138840, 1086325, 1020575 … 20, 20, 18, 15, 9, 15, 12, 13, 11, 12])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [366, 9200, 67227, 269945, 667383, 981255, 1123692, 1141210, 1088589, 1019823 … 23, 16, 18, 12, 12, 7, 13, 15, 13, 8])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [393, 8934, 66448, 268719, 666856, 978173, 1123923, 1140158, 1088008, 1020730 … 22, 12, 14, 17, 9, 13, 10, 10, 12, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [406, 9160, 67018, 270253, 671523, 983886, 1126620, 1143449, 1089491, 1021045 … 16, 13, 13, 12, 18, 13, 14, 10, 10, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [415, 8998, 67525, 272411, 675195, 986158, 1130119, 1145827, 1088111, 1020073 … 18, 19, 18, 17, 13, 20, 13, 7, 9, 7])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [410, 8985, 65549, 267449, 664307, 977594, 1124081, 1140220, 1086618, 1021566 … 18, 15, 20, 17, 15, 18, 11, 11, 7, 7])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [400, 8936, 65580, 264305, 660333, 971313, 1119553, 1137664, 1085894, 1018637 … 13, 16, 15, 12, 10, 11, 16, 11, 10, 13])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [398, 9168, 66979, 270335, 670909, 981304, 1126901, 1141814, 1088392, 1020991 … 22, 15, 14, 16, 9, 10, 11, 8, 9, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [393, 8598, 64748, 260829, 653188, 965067, 1114927, 1134836, 1086206, 1020185 … 22, 19, 12, 11, 17, 6, 10, 8, 16, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [381, 8520, 64998, 259176, 650528, 963140, 1115220, 1136505, 1084798, 1019899 … 11, 13, 20, 19, 9, 14, 8, 5, 7, 9])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [373, 9062, 65559, 266555, 663306, 975634, 1123014, 1139786, 1087470, 1021146 … 19, 18, 12, 21, 13, 16, 14, 6, 11, 6])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [369, 9039, 66653, 267366, 666605, 979446, 1125430, 1139824, 1088706, 1020179 … 16, 13, 19, 9, 16, 16, 16, 7, 4, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [369, 8506, 63395, 255550, 644715, 960495, 1111255, 1134256, 1083981, 1020177 … 15, 10, 15, 18, 6, 13, 8, 9, 11, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [366, 8573, 64798, 260387, 651845, 963980, 1114911, 1135131, 1083697, 1020231 … 15, 11, 22, 12, 12, 16, 9, 15, 7, 10])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [373, 9304, 66705, 270734, 668261, 980835, 1126531, 1144061, 1087314, 1021008 … 13, 18, 24, 8, 9, 7, 14, 7, 14, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [414, 8753, 64832, 260038, 653260, 966372, 1115372, 1137838, 1086086, 1019752 … 20, 18, 15, 15, 13, 16, 12, 9, 8, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [372, 9140, 65924, 265247, 663119, 974495, 1122680, 1138131, 1087676, 1019405 … 14, 13, 9, 14, 20, 9, 16, 14, 9, 5])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [358, 9327, 66221, 268590, 666136, 978777, 1125792, 1142609, 1087770, 1021044 … 16, 11, 20, 22, 16, 14, 7, 7, 10, 7])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [374, 8964, 66412, 265714, 661854, 974415, 1117545, 1137938, 1086687, 1019287 … 11, 15, 14, 9, 10, 8, 11, 10, 11, 5])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [432, 9036, 66459, 267464, 666375, 979721, 1125594, 1140966, 1087531, 1021255 … 11, 17, 15, 12, 14, 13, 16, 7, 14, 9])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [397, 8813, 66159, 267105, 666278, 979289, 1123041, 1138830, 1086769, 1019928 … 18, 20, 13, 16, 14, 10, 20, 7, 13, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [431, 9358, 68914, 278225, 684590, 994928, 1137105, 1149538, 1091562, 1023360 … 10, 16, 17, 10, 16, 10, 5, 12, 14, 11])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [407, 8984, 66164, 267920, 664019, 976103, 1122398, 1141606, 1087653, 1021085 … 17, 19, 13, 22, 12, 12, 8, 15, 15, 13])
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [410, 9003, 64979, 264323, 658577, 971453, 1119718, 1139015, 1086191, 1020992 … 18, 10, 18, 14, 13, 19, 12, 10, 7, 6])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [376, 8861, 65764, 264977, 659667, 973754, 1121951, 1140205, 1086232, 1020885 … 12, 16, 11, 15, 18, 12, 12, 15, 4, 14])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [387, 9047, 65918, 265106, 659539, 970163, 1119980, 1138262, 1085181, 1017680 … 16, 22, 13, 11, 14, 9, 11, 11, 14, 9])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [399, 8760, 64893, 262178, 656113, 967931, 1117909, 1137655, 1085977, 1020819 … 15, 18, 17, 17, 12, 15, 14, 19, 15, 15])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [404, 8863, 65468, 264853, 660066, 972893, 1119199, 1141358, 1086174, 1021062 … 9, 19, 10, 18, 10, 17, 8, 13, 7, 9])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [390, 8989, 66512, 269705, 670094, 979608, 1125544, 1142496, 1086708, 1021557 … 18, 15, 13, 15, 11, 10, 6, 11, 13, 11])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [367, 8659, 64917, 261810, 655949, 968603, 1117311, 1137433, 1086806, 1020229 … 15, 21, 16, 14, 12, 10, 12, 11, 11, 12])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [369, 8967, 65778, 266824, 661051, 972570, 1119984, 1138376, 1086613, 1021071 … 9, 13, 12, 18, 17, 18, 10, 8, 7, 16])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [360, 8990, 65659, 265327, 661221, 975023, 1119846, 1138774, 1085657, 1019517 … 21, 12, 16, 26, 12, 15, 15, 11, 7, 12])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [415, 9109, 67037, 268882, 668792, 979847, 1127408, 1140812, 1089240, 1018946 … 17, 15, 20, 11, 8, 8, 5, 10, 12, 9])
SHist: n=58458435 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [375, 9125, 67380, 271022, 672408, 981086, 1127943, 1142446, 1088122, 1019665 … 17, 14, 15, 18, 8, 9, 12, 9, 12, 6])
"oneHisto"
SHist: n=16836029329 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [113229, 2573686, 19013147, 76776828, 191017207, 280871996, 323201031, 328332785, 312998903, 293851706 … 4609, 4537, 4301, 4147, 3860, 3530, 3380, 3123, 2974, 2709])
"allRankLogs"
1
58458436
58458436
58458437
116916872
58458436
116916873
175375308
58458436
175375309
233833744
58458436
233833745
292292180
58458436
292292181
350750616
58458436
350750617
409209052
58458436
409209053
467667488
58458436
467667489
526125924
58458436
526125925
584584360
58458436
584584361
643042796
58458436
643042797
701501232
58458436
701501233
759959668
58458436
759959669
818418104
58458436
818418105
876876540
58458436
876876541
935334976
58458436
935334977
993793412
58458436
993793413
1052251848
58458436
1052251849
1110710284
58458436
1110710285
1169168720
58458436
1169168721
1227627156
58458436
1227627157
1286085592
58458436
1286085593
1344544028
58458436
1344544029
1403002464
58458436
1403002465
1461460900
58458436
1461460901
1519919336
58458436
1519919337
1578377772
58458436
1578377773
1636836208
58458436
1636836209
1695294644
58458436
1695294645
1753753080
58458436
1753753081
1812211516
58458436
1812211517
1870669952
58458436
1870669953
1929128388
58458436
1929128389
1987586824
58458436
1987586825
2046045260
58458436
2046045261
2104503696
58458436
2104503697
2162962132
58458436
2162962133
2221420568
58458436
2221420569
2279879004
58458436
2279879005
2338337440
58458436
16251444980
16309903414
58458435
16309903415
16368361849
58458435
16368361850
16426820284
58458435
16426820285
16485278719
58458435
16485278720
16543737154
58458435
16543737155
16602195589
58458435
16602195590
16660654024
58458435
16660654025
16719112459
58458435
16719112460
16777570894
58458435
16777570895
16836029329
58458435
```julia
# Load in one results file
d = load(joinpath(datapath, "histos_9x32.jld2"))
```

Base.KeySet for a Dict{String,Any} with 4 entries. Keys:
"allTimings"
"allHistos"
"oneHisto"
"allRankLogs"

```julia
# What's in the file?
keys(d)
```

Let's look at the histogram information. We have a set of histograms and we have the reduced histogram. They should match.
SHist: n=58458436 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [397, 8987, 67564, 271818, 671911, 983846, 1127727, 1143901, 1087948, 1019103 … 21, 8, 11, 14, 19, 9, 16, 9, 11, 6])
⋮ (one SHist per rank, each with n = 58458436 or 58458435; output truncated)
```julia
allHistos = d["allHistos"]
```

```julia
using OnlineStats
```

288

```julia
length(allHistos)  # We should have one histogram per rank
```

(Output: the tail of the 288-element vector of per-rank `Hist` histograms produced by the conversion below; each has n ≈ 58458436. Display truncated.)
```julia
# Convert our static histograms into Online histograms so we can merge them
allHistsO = Hist.(allHistos)
```

```julia
begin
    using Plots, Measures, StatsPlots
    gr()
end
```

```julia
using StatsBase
```

Dict(58458435 => 239, 58458436 => 49)

```julia
# How many histograms have what entries?
nobs.(allHistsO) |> countmap
```

```julia
plot( plot(allHistsO[1]), plot(allHistsO[20]), legend=nothing)
```

Hist: n=16836029329 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [113229, 2573686, 19013147, 76776828, 191017207, 280871996, 323201031, 328332785, 312998903, 293851706 … 4609, 4537, 4301, 4147, 3860, 3530, 3380, 3123, 2974, 2709])

```julia
# Reduce all of the separate histograms into one
allHistsOS = reduce(merge, allHistsO)
```

SHist: n=16836029329 | value=(x = 0.0f0:25.0f0:5000.0f0, y = [113229, 2573686, 19013147, 76776828, 191017207, 280871996, 323201031, 328332785, 312998903, 293851706 … 4609, 4537, 4301, 4147, 3860, 3530, 3380, 3123, 2974, 2709])

```julia
# Compare with the one reduced by MPI
oneHist = d["oneHisto"]
```

```julia
using Test
```

```julia
using PlutoUI
```

Test Summary:                    | Pass  Total
MPI Reduced Histogram is correct |    3      3
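The `reduce(merge, …)` step above relies on OnlineStats histogram merging being a bin-wise sum. A minimal sketch of that semantics, assuming the standard OnlineStats `Hist(edges)` constructor (the data values here are made up, not from the analysis):

```julia
using OnlineStats

# Two toy histograms over the same bin edges
h1 = fit!(Hist(0:25:100), [10.0, 30.0, 30.0, 80.0])
h2 = fit!(Hist(0:25:100), [5.0, 55.0, 80.0])

# Bin counts add, just as in the MPI.Reduce of the real job
h = merge(h1, h2)
nobs(h)   # 7 observations in total, nobs(h1) + nobs(h2)
```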
```julia
# Compare the histogram we just got by reduce to the one made by MPI.Reduce
with_terminal() do
    @testset "MPI Reduced Histogram is correct" begin
        @test nobs(oneHist) == nobs(allHistsOS)
        @test all( oneHist.counts .== allHistsOS.counts )
        @test all( oneHist.out .== allHistsOS.out )
    end
end
```

```julia
plot( Hist(oneHist), legend=nothing, linealpha=0.0 )
```

200

```julia
length(oneHist.counts)  # the number of bins
```

Let's look at timing information
(Output: the per-rank timing vectors returned by `rankTimings` below, 288 entries each; display truncated. Roughly, with names as in the DataFrame columns built later: openedFile ≈ 5.6–5.9 s, openedDataSet ≈ 0.005–0.03 s, determineRanges ≈ 0.011 s, readDataSet ≈ 8.2–10.8 s, madeHistogram ≈ 1.16–1.25 s, gatheredHistograms ≈ 0.48–2.9 s, reducedHistograms alternating between ≈ 1.3 s and ≈ 0.13 s on even/odd ranks, gatheredRankLogs ≈ 0.08–0.09 s.)
```julia
# Look at the timing information
rt = rankTimings(d["allTimings"])
```

```julia
function timingPlotsForRun(timings)
    p = []
    for (k, v) in pairs(timings)
        push!(p, scatter(v, legend=nothing, title=k, xaxis="Rank", yaxis="Seconds",
                         xticks=0:32:20*32, titlefontsize=11, xguidefontsize=8))
    end
    p
end
```

```julia
h2x32 = plot(timingPlotsForRun(rt)..., size=(1000,800))
```

(Output: `rtSum`, the per-rank total time for the 288 ranks, ≈ 16–19.7 s; display truncated.)
```julia
# Get the sum
rtSum = rankTotalTime(d["allTimings"])
```

```julia
scatter(rtSum, legend=nothing, yaxis="total time (seconds)", xaxis="rank", xticks=0:32:20*32)
```

(Output: the per-rank logs — each entry gives a rank's row range and row count, e.g. rows 1–58458436 with 58458436 rows for the first rank, rows 58458437–116916872 for the next, and so on up to row 16836029329; display truncated.)
```julia
# Get the log information
rankLogs = d["allRankLogs"]
```

```julia
scatter( [x.len for x in rankLogs], legend=nothing, xaxis="rank", yaxis="Number of rows", xticks=0:32:20*32)
```

```julia
# For what it's worth, can we get a single value out of the total time (mean and sd)?
using Statistics
```

(18.0935, 1.0118)

```julia
( mean(rtSum), std(rtSum) )
```

DataFrame all the things
Let's look at all of the output files and put the data into a DataFrame
```julia
using DataFrames
```

"histos_10x32.jld2"
"histos_12x32.jld2"
"histos_15x32.jld2"
"histos_20x32.jld2"
"histos_2x32.jld2"
"histos_3x32.jld2"
"histos_4x32.jld2"
"histos_5x32.jld2"
"histos_6x32.jld2"
"histos_7x32.jld2"
"histos_8x32.jld2"
"histos_9x32.jld2"

```julia
histoFiles
```

```julia
# Extract number of nodes from histogram file name
function extractNNodesFromFileName(fileName::String)
    m = match(r"histos_(\d+)x", fileName)
    parse(Int, m.captures[1])
end
```

[10, 12, 15, 20, 2, 3, 4, 5, 6, 7, 8, 9]
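As a quick sanity check of the filename-parsing regex used above, applied to one of the real file names (this snippet is illustrative and not part of the original notebook):

```julia
# The capture group grabs the digits between "histos_" and "x"
m = match(r"histos_(\d+)x", "histos_20x32.jld2")
parse(Int, m.captures[1])   # 20
```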
```julia
extractNNodesFromFileName.(histoFiles)
```

```julia
# Read in histos_nx32.jld2 file and return a dataframe
function dataFrameFromRankData(fileName::String)
    numNodes = extractNNodesFromFileName(fileName)
    data = load(joinpath(datapath, fileName))    # Load the JLD2 file
    rt = rankTimings(data["allTimings"])         # Extract the rank timings
    rl = data["allRankLogs"]                     # Get the log info (number of rows processed)
    numRanks = length(rl)                        # How many ranks?

    # Construct the DataFrame by columns
    df = DataFrame(numNodes=fill(numNodes, numRanks), rank=0:numRanks-1,
                   numRows=[r.len for r in rl])
    df = hcat(df, DataFrame(rt))                 # Convert the named tuple of timings to DataFrame
    df = hcat(df, DataFrame(totalTime=rankTotalTime(data["allTimings"])))
    return df
end
```

Try processing one file...
```julia
dfOne = dataFrameFromRankData(histoFiles[2]);
```

```julia
using PlutoDataTable
# Nice prototype DataFrame viewer from https://github.com/mthelm85/PlutoDataTable.jl
# They need to work on the number of significant figures
```

| numNodes | rank | numRows | openedFile | openedDataSet | determineRanges | readDataSet | madeHistogram | gatheredHistograms | reducedHistograms | gatheredRankLogs | totalTime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 0 | 43843827 | 6.137938976287842 | 0.005266904830932617 | 0.02328014373779297 | 8.194602012634277 | 0.9821538925170898 | 0.5564579963684082 | 1.4516401290893555 | 0.08662891387939453 | 17.437968969345093 |
| 12 | 1 | 43843827 | 5.862958908081055 | 0.005109071731567383 | 0.01510000228881836 | 7.6595799922943115 | 0.9531040191650391 | 0.5354259014129639 | 0.15390491485595703 | 0.09433913230895996 | 15.279521942138672 |
| 12 | 2 | 43843827 | 5.852871894836426 | 0.012571096420288086 | 0.01164102554321289 | 8.392387866973877 | 0.9695661067962646 | 0.5315930843353271 | 1.417914867401123 | 0.08737897872924805 | 17.275924921035767 |
| 12 | 3 | 43843827 | 5.851037979125977 | 0.014657974243164062 | 0.011658906936645508 | 8.06164002418518 | 0.9365639686584473 | 0.5249769687652588 | 0.13849902153015137 | 0.08597397804260254 | 15.625008821487427 |
| 12 | 4 | 43843827 | 5.8412041664123535 | 0.024796009063720703 | 0.011391878128051758 | 8.040452003479004 | 0.9541668891906738 | 0.6300950050354004 | 1.4128751754760742 | 0.08937501907348633 | 17.004356145858765 |
| 12 | 5 | 43843827 | 5.845725059509277 | 0.02020096778869629 | 0.011543035507202148 | 8.161539793014526 | 0.9370419979095459 | 0.526123046875 | 0.13825297355651855 | 0.08462119102478027 | 15.725048065185547 |
| 12 | 6 | 43843827 | 5.837119102478027 | 0.029242992401123047 | 0.011173009872436523 | 7.804313898086548 | 0.9286930561065674 | 0.5002119541168213 | 1.3672730922698975 | 0.0899958610534668 | 16.568022966384888 |
| 12 | 7 | 43843827 | 5.8376500606536865 | 0.02910590171813965 | 0.011319160461425781 | 7.613639831542969 | 0.934002161026001 | 0.4859929084777832 | 0.13613605499267578 | 0.08755993843078613 | 15.135406017303467 |
| 12 | 8 | 43843827 | 5.8386759757995605 | 0.027688026428222656 | 0.011397838592529297 | 8.007841110229492 | 0.9337968826293945 | 1.069261074066162 | 1.3858120441436768 | 0.0849609375 | 17.359433889389038 |
| 12 | 9 | 43843827 | 5.839199066162109 | 0.027292966842651367 | 0.011367082595825195 | 8.296742916107178 | 0.961453914642334 | 0.5117120742797852 | 0.13390088081359863 | 0.08134603500366211 | 15.863014936447144 |
```julia
data_table(dfOne)
```

Now process all of the files...
```julia
# Make a dataframe from all of the files (note the splat operator)
begin
    df = vcat( dataFrameFromRankData.(histoFiles)... )
    sort!(df)
end;
```

| numNodes | rank | numRows | openedFile | openedDataSet | determineRanges | readDataSet | madeHistogram | gatheredHistograms | reducedHistograms | gatheredRankLogs | totalTime |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0 | 263062959 | 5.6598060131073 | 0.0055429935455322266 | 0.0214691162109375 | 34.76853895187378 | 4.499200105667114 | 2.8141138553619385 | 1.242159128189087 | 0.08100080490112305 | 49.09183096885681 |
| 2 | 1 | 263062959 | 5.403605937957764 | 0.014529943466186523 | 0.012408018112182617 | 34.34525394439697 | 4.442309141159058 | 0.5253279209136963 | 0.15195012092590332 | 0.08957195281982422 | 44.98495697975159 |
| 2 | 2 | 263062959 | 5.383642911911011 | 0.036483049392700195 | 0.011342048645019531 | 34.369189977645874 | 4.426295042037964 | 0.4979870319366455 | 1.2614738941192627 | 0.08274412155151367 | 46.06915807723999 |
| 2 | 3 | 263062959 | 5.3874831199646 | 0.03250408172607422 | 0.011675834655761719 | 34.25730109214783 | 4.438740015029907 | 0.5010528564453125 | 0.13320422172546387 | 0.08342194557189941 | 44.845383167266846 |
| 2 | 4 | 263062959 | 5.3945770263671875 | 0.02544999122619629 | 0.011377096176147461 | 35.315735816955566 | 4.415354013442993 | 0.8452939987182617 | 1.2535021305084229 | 0.08552384376525879 | 47.346813917160034 |
| 2 | 5 | 263062959 | 5.395463943481445 | 0.024688005447387695 | 0.011518001556396484 | 35.33270788192749 | 4.590600967407227 | 0.4957242012023926 | 0.13243699073791504 | 0.08124995231628418 | 46.06438994407654 |
| 2 | 6 | 263062959 | 5.384156942367554 | 0.036296844482421875 | 0.011256217956542969 | 35.39437484741211 | 4.535653114318848 | 0.6462059020996094 | 1.2381141185760498 | 0.0841059684753418 | 47.33016395568848 |
| 2 | 7 | 263062959 | 5.3957200050354 | 0.024638891220092773 | 0.011584043502807617 | 35.65788292884827 | 4.43148398399353 | 0.48669004440307617 | 0.1309831142425537 | 0.08180594444274902 | 46.22078895568848 |
| 2 | 8 | 263062959 | 5.3800458908081055 | 0.04062008857727051 | 0.011317014694213867 | 34.66770696640015 | 4.505692005157471 | 2.264936923980713 | 1.249094009399414 | 0.08036398887634277 | 48.19977688789368 |
| 2 | 9 | 263062959 | 5.380669832229614 | 0.04000997543334961 | 0.011162042617797852 | 35.44915294647217 | 4.607716083526611 | 0.480816125869751 | 0.12989592552185059 | 0.07915496826171875 | 46.17857789993286 |
```julia
data_table(df)
```

```julia
# Group the dataframes by number of nodes
gdf = groupby(df, :numNodes);
```

[2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20]
```julia
theNumNodes = [k[1] for k in keys(gdf)]
```

```julia
# Make plots for a group
function plotsForRun(df)
    cols = 3:ncol(df)   # Don't plot numNodes and rank columns
    p = []
    for i in cols
        yaxis = i == 3 ? "# rows read" : "seconds"
        push!(p, scatter(df.rank, df[i], legend=nothing, title=names(df)[i],
                         xaxis="Rank", yaxis=yaxis, xticks=0:32:20*32,
                         titlefontsize=11, xguidefontsize=8, markersize=2))
    end
    p
end
```

With the slider below, you can choose which run to view. Note that this is a little glitchy: if not all of the plots appear, adjust the slider and go back.

```julia
@bind e Slider(1:length(gdf))
```

Plots for run with 15 nodes (32 ranks per node)

```julia
plot(plotsForRun(gdf[e])..., size=(1000,700), layout=(5,2))
```

Plot the scaling...

```julia
strongScalingPlot = @df df boxplot(:numNodes, :totalTime, legend=nothing,
    title="Strong Scaling Study (one plot)", xaxis="Number of nodes",
    yaxis="Total time (s)", size=(800,600))
```

Determine the maximum total time for each run, because the job is only as fast as the slowest rank.
12 rows × 2 columns
|  | numNodes | totalTime_maximum |
|---|---|---|
|  | Int64 | Float64 |
| 1 | 2 | 49.0918 |
| 2 | 3 | 36.3385 |
| 3 | 4 | 28.9671 |
| 4 | 5 | 27.3876 |
| 5 | 6 | 25.4913 |
| 6 | 7 | 23.771 |
| 7 | 8 | 20.118 |
| 8 | 9 | 19.6573 |
| 9 | 10 | 18.4349 |
| 10 | 12 | 17.438 |
| 11 | 15 | 15.7963 |
| 12 | 20 | 14.5096 |
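From the maximum-time table above one can sketch a quick speedup and parallel-efficiency calculation (not in the original notebook; the 2-node run is taken as the baseline, so "ideal" means time falling like 2/numNodes):

```julia
# Node counts and maximum total times (seconds) copied from the table above
nodes = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20]
tmax  = [49.0918, 36.3385, 28.9671, 27.3876, 25.4913, 23.771,
         20.118, 19.6573, 18.4349, 17.438, 15.7963, 14.5096]

speedup    = tmax[1] ./ tmax      # measured speedup relative to 2 nodes
ideal      = nodes ./ nodes[1]    # ideal (linear) strong-scaling speedup
efficiency = speedup ./ ideal     # 1.0 would be perfect strong scaling

round(efficiency[end], digits=2)  # ≈ 0.34 at 20 nodes
```

So even for the in-MPI portion of the job, going from 2 to 20 nodes buys roughly a 3.4× speedup against an ideal 10×, which is why the cost in node-time rises with node count.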
```julia
totalMPITimes = combine(gdf, :totalTime => maximum)
```

```julia
maxMPITimesPlot = @df totalMPITimes scatter(:numNodes, :totalTime_maximum, legend=nothing,
    xaxis="Number of nodes", yaxis="Maximum total time")
```

```julia
# Add the cost (number of nodes * hours * 140) - Here are four different ways to do this
# Remember how this works...
#   transform(df, old_columns => function => new_columns)

# Function accepts selected columns as arguments (note broadcasting)
dfa = DataFrames.transform(df, [:numNodes, :totalTime] => ( (n,t) -> (140/60/60)n .* t ) => :cost);

# Function accepts elements from selected columns as arguments and implicitly loops over rows (note no broadcasting)
#dfa = DataFrames.transform(df, [:numNodes, :totalTime] => ByRow( (x,y) -> (140/60/60)x * y ) => :cost);

# Function accepts a Named Tuple containing selected columns (note broadcasting)
#dfa = DataFrames.transform(df, AsTable([:numNodes, :totalTime]) => (t -> (140/60/60)t.numNodes .* t.totalTime) => :cost);

# Function accepts a named tuple containing elements from selected columns (note no broadcasting)
#dfa = DataFrames.transform(df, AsTable([:numNodes, :totalTime]) => ByRow(t -> (140/60/60)t.numNodes * t.totalTime) => :cost);
```

| numNodes | totalTime | cost |
|---|---|---|
| 2 | 49.09183096885681 | 3.818253519799974 |
| 2 | 44.98495697975159 | 3.4988299873140125 |
| 2 | 46.06915807723999 | 3.5831567393408883 |
| 2 | 44.845383167266846 | 3.487974246342977 |
| 2 | 47.346813917160034 | 3.6825299713346693 |
| 2 | 46.06438994407654 | 3.582785884539286 |
| 2 | 47.33016395568848 | 3.681234974331326 |
| 2 | 46.22078895568848 | 3.5949502521091037 |
| 2 | 48.19977688789368 | 3.7488715357250637 |
| 2 | 46.17857789993286 | 3.591667169994778 |
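The cost column can be sanity-checked by hand, assuming (as the comment above indicates) that 140 is the charge rate in NERSC units per node-hour, i.e. 140/3600 units per node-second:

```julia
# Cost in NERSC units for numNodes nodes running for `seconds` seconds
cost(numNodes, seconds) = (140 / 3600) * numNodes * seconds

cost(2, 49.09183096885681)   # ≈ 3.8183, matching the first row of the cost table
```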
```julia
data_table(select(dfa, [1,12,13]))
```

```julia
strongScalingCostPlot = @df dfa boxplot(:numNodes, :cost, legend=nothing,
    title="Strong Scaling Study (one plot)", xaxis="Number of nodes",
    yaxis="Cost (NERSC Units)", size=(800,600))
```

Do this for the maximum times
12 rows × 3 columns
|  | numNodes | totalTime_maximum | cost |
|---|---|---|---|
|  | Int64 | Float64 | Float64 |
| 1 | 2 | 49.0918 | 3.81825 |
| 2 | 3 | 36.3385 | 4.23949 |
| 3 | 4 | 28.9671 | 4.50599 |
| 4 | 5 | 27.3876 | 5.32537 |
| 5 | 6 | 25.4913 | 5.94797 |
| 6 | 7 | 23.771 | 6.471 |
| 7 | 8 | 20.118 | 6.25893 |
| 8 | 9 | 19.6573 | 6.88006 |
| 9 | 10 | 18.4349 | 7.16913 |
| 10 | 12 | 17.438 | 8.13772 |
| 11 | 15 | 15.7963 | 9.21449 |
| 12 | 20 | 14.5096 | 11.2852 |
```julia
transform!(totalMPITimes, AsTable(:) => (t -> (140.0/60/60)t.numNodes .* t.totalTime_maximum) => :cost)
```

```julia
costMPIPlot = @df totalMPITimes scatter(:numNodes, [:totalTime_maximum :cost],
    label=["Total time (s)" "Cost (NERSC Units)"],
    xaxis="Number of nodes", yaxis="seconds or NERSC units")
```

Examine accounting information
Examining the raw timing information from within MPI is not the whole story. Let's look at the Cori accounting information. I can do that by running ~/bin/sacct_csv in my NERSC directory. I've copied the output here.
```julia
using CSV
```

125 rows × 25 columns (omitted printing of 18 columns)

|  | JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS |
|---|---|---|---|---|---|---|---|
|  | String | String | String? | String | String | Int64 | Int64 |
| 1 | 35415096 | run_strongScalingJob.sh | debug_hsw | FAILED | 15:0 | 1 | 64 |
| 2 | 35415096.batch | batch | missing | FAILED | 15:0 | 1 | 64 |
| 3 | 35415096.extern | extern | missing | COMPLETED | 0:0 | 1 | 64 |
| 4 | 35415096.0 | julia | missing | FAILED | 15:0 | 1 | 32 |
| 5 | 35415110 | run_strongScalingJob.sh | debug_hsw | FAILED | 1:0 | 1 | 64 |
| 6 | 35415110.batch | batch | missing | FAILED | 1:0 | 1 | 64 |
| 7 | 35415110.extern | extern | missing | COMPLETED | 0:0 | 1 | 64 |
| 8 | 35415110.0 | julia | missing | FAILED | 1:0 | 1 | 32 |
| 9 | 35415167 | run_strongScalingJob.sh | debug_hsw | FAILED | 1:0 | 1 | 64 |
| 10 | 35415167.batch | batch | missing | FAILED | 1:0 | 1 | 64 |
| 11 | 35415167.extern | extern | missing | COMPLETED | 0:0 | 1 | 64 |
| 12 | 35415167.0 | julia | missing | FAILED | 1:0 | 1 | 32 |
| 13 | 35415240 | run_strongScalingJob.sh | debug_hsw | OUT_OF_MEMORY | 0:125 | 1 | 64 |
| 14 | 35415240.batch | batch | missing | OUT_OF_MEMORY | 0:125 | 1 | 64 |
| 15 | 35415240.extern | extern | missing | COMPLETED | 0:0 | 1 | 64 |
| 16 | 35415240.0 | julia | missing | OUT_OF_MEMORY | 0:125 | 1 | 32 |
| 17 | 35415285 | run_strongScalingJob.sh | debug_hsw | FAILED | 15:0 | 2 | 128 |
| 18 | 35415285.batch | batch | missing | FAILED | 15:0 | 1 | 64 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
```julia
sacct = CSV.File(joinpath(datapath, "sacct.csv")) |> DataFrame
# Note that there are extraneous jobs in this file
```

We can pull the job IDs from the log files...
"slurm-35415305_2x32.out"
"slurm-35415351_3x32.out"
"slurm-35415425_4x32.out"
"slurm-35415476_5x32.out"
"slurm-35415515_6x32.out"
"slurm-35415541_7x32.out"
"slurm-35415576_8x32.out"
"slurm-35415699_9x32.out"
"slurm-35415777_10x32.out"
"slurm-35584495_12x32.out"
"slurm-35584758_15x32.out"
"slurm-35587032_20x32.out"
```julia
slurmLogFiles = basename.(glob("slurm-3*.out", datapath))
```

```julia
function jobIdFromSlurmLogName(fn)
    m = match(r"slurm-(reg-)?(\d+)[_]", fn)
    m.captures[2]
end
```

"35415305"
"35415351"
"35415425"
"35415476"
"35415515"
"35415541"
"35415576"
"35415699"
"35415777"
"35584495"
"35584758"
"35587032"
```julia
slurmIds = jobIdFromSlurmLogName.(slurmLogFiles)
```
```julia
# Select out the jobIds that we care about
function selectDesiredJobIds(jobIds)
    sacctM = filter(:JobID => j -> occursin.(jobIds, j) |> any, sacct)
    # And don't care about the batch or extern jobs (not sure what they are)
    filter!(:JobName => jn -> jn != "batch" && jn != "extern", sacctM)
    sacctM
end
```

24 rows × 25 columns (omitted printing of 17 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | |
|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | |
| 1 | 35415305 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 2 | 128 | missing |
| 2 | 35415305.0 | julia | missing | COMPLETED | 0:0 | 2 | 64 | Block |
| 3 | 35415351 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 3 | 192 | missing |
| 4 | 35415351.0 | julia | missing | COMPLETED | 0:0 | 3 | 96 | Block |
| 5 | 35415425 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 4 | 256 | missing |
| 6 | 35415425.0 | julia | missing | COMPLETED | 0:0 | 4 | 128 | Block |
| 7 | 35415476 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 5 | 320 | missing |
| 8 | 35415476.0 | julia | missing | COMPLETED | 0:0 | 5 | 160 | Block |
| 9 | 35415515 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 6 | 384 | missing |
| 10 | 35415515.0 | julia | missing | COMPLETED | 0:0 | 6 | 192 | Block |
| 11 | 35415541 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 7 | 448 | missing |
| 12 | 35415541.0 | julia | missing | COMPLETED | 0:0 | 7 | 224 | Block |
| 13 | 35415576 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 8 | 512 | missing |
| 14 | 35415576.0 | julia | missing | COMPLETED | 0:0 | 8 | 256 | Block |
| 15 | 35415699 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 9 | 576 | missing |
| 16 | 35415699.0 | julia | missing | COMPLETED | 0:0 | 9 | 288 | Block |
| 17 | 35415777 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 10 | 640 | missing |
| 18 | 35415777.0 | julia | missing | COMPLETED | 0:0 | 10 | 320 | Block |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
```julia
sacctM = selectDesiredJobIds(slurmIds)
```

| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | CPUTimeRAW | ElapsedRaw | Submit | Start | End | MaxRSS | MaxRSSNode | MaxRSSTask | MaxVMSize | MaxVMSizeNode | MaxVMSizeTask | MaxDiskRead | MaxDiskReadNode | MaxDiskReadTask | MaxDiskWrite | MaxDiskWriteNode | MaxDiskWriteTask |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35415305 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 2 | 128 | 10880 | 85 | 2020-10-22T09:21:59.0 | 2020-10-22T09:21:59.0 | 2020-10-22T09:23:24.0 | |||||||||||||
| 35415305.0 | julia | COMPLETED | 0:0 | 2 | 64 | Block | 5120 | 80 | 2020-10-22T09:22:04.0 | 2020-10-22T09:22:04.0 | 2020-10-22T09:23:24.0 | 2222612K | nid00881 | 41 | 3048020K | nid00881 | 41 | 921.12M | nid00880 | 25 | 0.38M | nid00880 | 0 | |
| 35415351 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 3 | 192 | 13056 | 68 | 2020-10-22T09:24:16.0 | 2020-10-22T09:24:19.0 | 2020-10-22T09:25:27.0 | |||||||||||||
| 35415351.0 | julia | COMPLETED | 0:0 | 3 | 96 | Block | 6048 | 63 | 2020-10-22T09:24:24.0 | 2020-10-22T09:24:24.0 | 2020-10-22T09:25:27.0 | 249871K | nid01149 | 0 | 1803104K | nid01151 | 64 | 19.90M | nid01150 | 32 | 0.37M | nid01149 | 0 | |
| 35415425 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 4 | 256 | 16128 | 63 | 2020-10-22T09:26:01.0 | 2020-10-22T09:26:04.0 | 2020-10-22T09:27:07.0 | |||||||||||||
| 35415425.0 | julia | COMPLETED | 0:0 | 4 | 128 | Block | 8064 | 63 | 2020-10-22T09:26:06.0 | 2020-10-22T09:26:06.0 | 2020-10-22T09:27:09.0 | 251817K | nid01243 | 0 | 1633996K | nid01608 | 96 | 19.90M | nid01245 | 32 | 0.37M | nid01243 | 0 | |
| 35415476 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 5 | 320 | 18880 | 59 | 2020-10-22T09:28:44.0 | 2020-10-22T09:28:47.0 | 2020-10-22T09:29:46.0 | |||||||||||||
| 35415476.0 | julia | COMPLETED | 0:0 | 5 | 160 | Block | 8800 | 55 | 2020-10-22T09:28:51.0 | 2020-10-22T09:28:51.0 | 2020-10-22T09:29:46.0 | 251853K | nid00749 | 0 | 1531348K | nid00753 | 128 | 19.90M | nid00753 | 128 | 0.37M | nid00749 | 0 | |
| 35415515 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 6 | 384 | 22656 | 59 | 2020-10-22T09:30:22.0 | 2020-10-22T09:30:24.0 | 2020-10-22T09:31:23.0 | |||||||||||||
| 35415515.0 | julia | COMPLETED | 0:0 | 6 | 192 | Block | 11712 | 61 | 2020-10-22T09:30:25.0 | 2020-10-22T09:30:25.0 | 2020-10-22T09:31:26.0 | 254048K | nid02060 | 0 | 1464996K | nid02062 | 64 | 19.90M | nid02062 | 64 | 0.37M | nid02060 | 0 | |
| 35415541 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 7 | 448 | 24640 | 55 | 2020-10-22T09:31:46.0 | 2020-10-22T09:31:49.0 | 2020-10-22T09:32:44.0 | |||||||||||||
| 35415541.0 | julia | COMPLETED | 0:0 | 7 | 224 | Block | 11872 | 53 | 2020-10-22T09:31:51.0 | 2020-10-22T09:31:51.0 | 2020-10-22T09:32:44.0 | 253960K | nid01866 | 0 | 1416164K | nid01873 | 64 | 19.90M | nid01873 | 64 | 0.37M | nid01866 | 0 | |
| 35415576 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 8 | 512 | 28160 | 55 | 2020-10-22T09:33:36.0 | 2020-10-22T09:33:41.0 | 2020-10-22T09:34:36.0 | |||||||||||||
| 35415576.0 | julia | COMPLETED | 0:0 | 8 | 256 | Block | 13568 | 53 | 2020-10-22T09:33:43.0 | 2020-10-22T09:33:43.0 | 2020-10-22T09:34:36.0 | 256221K | nid00880 | 0 | 1381628K | nid01139 | 96 | 19.90M | nid01139 | 96 | 0.37M | nid00880 | 0 | |
| 35415699 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 9 | 576 | 30528 | 53 | 2020-10-22T09:39:03.0 | 2020-10-22T09:39:03.0 | 2020-10-22T09:39:56.0 | |||||||||||||
| 35415699.0 | julia | COMPLETED | 0:0 | 9 | 288 | Block | 13824 | 48 | 2020-10-22T09:39:09.0 | 2020-10-22T09:39:09.0 | 2020-10-22T09:39:57.0 | 256088K | nid12971 | 0 | 1353192K | nid12978 | 224 | 19.90M | nid12973 | 64 | 0.37M | nid12971 | 0 | |
| 35415777 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 10 | 640 | 35840 | 56 | 2020-10-22T09:42:32.0 | 2020-10-22T09:42:40.0 | 2020-10-22T09:43:36.0 | |||||||||||||
| 35415777.0 | julia | COMPLETED | 0:0 | 10 | 320 | Block | 16000 | 50 | 2020-10-22T09:42:46.0 | 2020-10-22T09:42:46.0 | 2020-10-22T09:43:36.0 | 261248K | nid01187 | 0 | 1333328K | nid01187 | 0 | 30.67M | nid01403 | 254 | 0.37M | nid01187 | 0 | |
| 35584495 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 12 | 768 | 60672 | 79 | 2020-10-27T09:12:00.0 | 2020-10-27T09:12:42.0 | 2020-10-27T09:14:01.0 | |||||||||||||
| 35584495.0 | julia | COMPLETED | 0:0 | 12 | 384 | Block | 28800 | 75 | 2020-10-27T09:12:46.0 | 2020-10-27T09:12:46.0 | 2020-10-27T09:14:01.0 | 476295K | nid02065 | 0 | 1314924K | nid12916 | 128 | 171.78M | nid12943 | 306 | 0.43M | nid02065 | 0 | |
| 35584758 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 15 | 960 | 72000 | 75 | 2020-10-27T09:14:46.0 | 2020-10-27T09:25:12.0 | 2020-10-27T09:26:27.0 | |||||||||||||
| 35584758.0 | julia | COMPLETED | 0:0 | 15 | 480 | Block | 32640 | 68 | 2020-10-27T09:25:16.0 | 2020-10-27T09:25:16.0 | 2020-10-27T09:26:24.0 | 428873K | nid00822 | 0 | 1306712K | nid00822 | 0 | 141.42M | nid01190 | 128 | 0.54M | nid00822 | 0 | |
| 35587032 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 20 | 1280 | 101120 | 79 | 2020-10-27T09:42:52.0 | 2020-10-27T09:53:16.0 | 2020-10-27T09:54:35.0 | |||||||||||||
| 35587032.0 | julia | COMPLETED | 0:0 | 20 | 640 | Block | 49280 | 77 | 2020-10-27T09:53:20.0 | 2020-10-27T09:53:20.0 | 2020-10-27T09:54:37.0 | 306812K | nid02081 | 544 | 1222636K | nid00782 | 128 | 20.24M | nid00781 | 126 | 0.37M | nid00778 | 0 |
```julia
data_table(sacctM; items_per_page=40)
```

```julia
# Split this up into Julia info and batch info
function splitIntoBatchAndJulia(sacctM)
    batchInfo = filter(:JobName => jn -> jn != "julia", sacctM)
    juliaInfo = filter(:JobName => jn -> jn == "julia", sacctM)
    return batchInfo, juliaInfo
end
```

12 rows × 25 columns (omitted printing of 17 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | |
|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | |
| 1 | 35415305 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 2 | 128 | missing |
| 2 | 35415351 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 3 | 192 | missing |
| 3 | 35415425 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 4 | 256 | missing |
| 4 | 35415476 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 5 | 320 | missing |
| 5 | 35415515 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 6 | 384 | missing |
| 6 | 35415541 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 7 | 448 | missing |
| 7 | 35415576 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 8 | 512 | missing |
| 8 | 35415699 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 9 | 576 | missing |
| 9 | 35415777 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 10 | 640 | missing |
| 10 | 35584495 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 12 | 768 | missing |
| 11 | 35584758 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 15 | 960 | missing |
| 12 | 35587032 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 20 | 1280 | missing |
12 rows × 25 columns (omitted printing of 15 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | CPUTimeRAW | ElapsedRaw | |
|---|---|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | Int64 | Int64 | |
| 1 | 35415305.0 | julia | missing | COMPLETED | 0:0 | 2 | 64 | Block | 5120 | 80 |
| 2 | 35415351.0 | julia | missing | COMPLETED | 0:0 | 3 | 96 | Block | 6048 | 63 |
| 3 | 35415425.0 | julia | missing | COMPLETED | 0:0 | 4 | 128 | Block | 8064 | 63 |
| 4 | 35415476.0 | julia | missing | COMPLETED | 0:0 | 5 | 160 | Block | 8800 | 55 |
| 5 | 35415515.0 | julia | missing | COMPLETED | 0:0 | 6 | 192 | Block | 11712 | 61 |
| 6 | 35415541.0 | julia | missing | COMPLETED | 0:0 | 7 | 224 | Block | 11872 | 53 |
| 7 | 35415576.0 | julia | missing | COMPLETED | 0:0 | 8 | 256 | Block | 13568 | 53 |
| 8 | 35415699.0 | julia | missing | COMPLETED | 0:0 | 9 | 288 | Block | 13824 | 48 |
| 9 | 35415777.0 | julia | missing | COMPLETED | 0:0 | 10 | 320 | Block | 16000 | 50 |
| 10 | 35584495.0 | julia | missing | COMPLETED | 0:0 | 12 | 384 | Block | 28800 | 75 |
| 11 | 35584758.0 | julia | missing | COMPLETED | 0:0 | 15 | 480 | Block | 32640 | 68 |
| 12 | 35587032.0 | julia | missing | COMPLETED | 0:0 | 20 | 640 | Block | 49280 | 77 |
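From the Julia elapsed times in the table above we can estimate the parallel efficiency of the *total* Julia run: going from 2 nodes (80 s) to 10 nodes (50 s) is only a 1.6× speedup for 5× the nodes. A quick arithmetic sketch:

```julia
# Parallel efficiency of the total Julia elapsed time, using ElapsedRaw
# from the table above: 80 s on 2 nodes vs 50 s on 10 nodes.
speedup    = 80 / 50    # observed speedup going from 2 to 10 nodes
ideal      = 10 / 2     # ideal speedup from 5x the nodes
efficiency = round(speedup / ideal, digits=2)   # about 0.32, i.e. ~32%
```

This is consistent with the observation below that the total elapsed time scales much worse than the MPI-timed portion.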
```julia
batchInfo, juliaInfo = splitIntoBatchAndJulia(sacctM)
```

| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | CPUTimeRAW | ElapsedRaw | Submit | Start | End | MaxRSS | MaxRSSNode | MaxRSSTask | MaxVMSize | MaxVMSizeNode | MaxVMSizeTask | MaxDiskRead | MaxDiskReadNode | MaxDiskReadTask | MaxDiskWrite | MaxDiskWriteNode | MaxDiskWriteTask |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35415305 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 2 | 128 | 10880 | 85 | 2020-10-22T09:21:59.0 | 2020-10-22T09:21:59.0 | 2020-10-22T09:23:24.0 | |||||||||||||
| 35415351 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 3 | 192 | 13056 | 68 | 2020-10-22T09:24:16.0 | 2020-10-22T09:24:19.0 | 2020-10-22T09:25:27.0 | |||||||||||||
| 35415425 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 4 | 256 | 16128 | 63 | 2020-10-22T09:26:01.0 | 2020-10-22T09:26:04.0 | 2020-10-22T09:27:07.0 | |||||||||||||
| 35415476 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 5 | 320 | 18880 | 59 | 2020-10-22T09:28:44.0 | 2020-10-22T09:28:47.0 | 2020-10-22T09:29:46.0 | |||||||||||||
| 35415515 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 6 | 384 | 22656 | 59 | 2020-10-22T09:30:22.0 | 2020-10-22T09:30:24.0 | 2020-10-22T09:31:23.0 | |||||||||||||
| 35415541 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 7 | 448 | 24640 | 55 | 2020-10-22T09:31:46.0 | 2020-10-22T09:31:49.0 | 2020-10-22T09:32:44.0 | |||||||||||||
| 35415576 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 8 | 512 | 28160 | 55 | 2020-10-22T09:33:36.0 | 2020-10-22T09:33:41.0 | 2020-10-22T09:34:36.0 | |||||||||||||
| 35415699 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 9 | 576 | 30528 | 53 | 2020-10-22T09:39:03.0 | 2020-10-22T09:39:03.0 | 2020-10-22T09:39:56.0 | |||||||||||||
| 35415777 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 10 | 640 | 35840 | 56 | 2020-10-22T09:42:32.0 | 2020-10-22T09:42:40.0 | 2020-10-22T09:43:36.0 | |||||||||||||
| 35584495 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 12 | 768 | 60672 | 79 | 2020-10-27T09:12:00.0 | 2020-10-27T09:12:42.0 | 2020-10-27T09:14:01.0 |
```julia
data_table(batchInfo)
```

12 rows × 26 columns (omitted printing of 16 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | CPUTimeRAW | ElapsedRaw | |
|---|---|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | Int64 | Int64 | |
| 1 | 35415305.0 | julia | missing | COMPLETED | 0:0 | 2 | 64 | Block | 5120 | 80 |
| 2 | 35415351.0 | julia | missing | COMPLETED | 0:0 | 3 | 96 | Block | 6048 | 63 |
| 3 | 35415425.0 | julia | missing | COMPLETED | 0:0 | 4 | 128 | Block | 8064 | 63 |
| 4 | 35415476.0 | julia | missing | COMPLETED | 0:0 | 5 | 160 | Block | 8800 | 55 |
| 5 | 35415515.0 | julia | missing | COMPLETED | 0:0 | 6 | 192 | Block | 11712 | 61 |
| 6 | 35415541.0 | julia | missing | COMPLETED | 0:0 | 7 | 224 | Block | 11872 | 53 |
| 7 | 35415576.0 | julia | missing | COMPLETED | 0:0 | 8 | 256 | Block | 13568 | 53 |
| 8 | 35415699.0 | julia | missing | COMPLETED | 0:0 | 9 | 288 | Block | 13824 | 48 |
| 9 | 35415777.0 | julia | missing | COMPLETED | 0:0 | 10 | 320 | Block | 16000 | 50 |
| 10 | 35584495.0 | julia | missing | COMPLETED | 0:0 | 12 | 384 | Block | 28800 | 75 |
| 11 | 35584758.0 | julia | missing | COMPLETED | 0:0 | 15 | 480 | Block | 32640 | 68 |
| 12 | 35587032.0 | julia | missing | COMPLETED | 0:0 | 20 | 640 | Block | 49280 | 77 |
```julia
# Add CPU time per rank
transform!(juliaInfo, [:CPUTimeRAW, :NCPUS] => ((c, n) -> c ./ n) => :CPUTimePerRank)
```

So the elapsed time is exactly the CPU seconds per task.
12 rows × 2 columns
| ElapsedRaw | CPUTimePerRank | |
|---|---|---|
| Int64 | Float64 | |
| 1 | 80 | 80.0 |
| 2 | 63 | 63.0 |
| 3 | 63 | 63.0 |
| 4 | 55 | 55.0 |
| 5 | 61 | 61.0 |
| 6 | 53 | 53.0 |
| 7 | 53 | 53.0 |
| 8 | 48 | 48.0 |
| 9 | 50 | 50.0 |
| 10 | 75 | 75.0 |
| 11 | 68 | 68.0 |
| 12 | 77 | 77.0 |
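The identity in the table above is easy to check by hand with the 2-node row:

```julia
# Spot-check the 2-node job: CPUTimeRAW = 5120 CPU-seconds over NCPUS = 64
# gives exactly the 80 s ElapsedRaw. In other words, sacct charges every
# CPU for the full elapsed time of the step, so CPUTimeRAW/NCPUS is just
# the elapsed time again.
cputime_per_rank = 5120 / 64   # 80.0
```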
```julia
# So the elapsed time is the same as the CPU time per rank?
select(juliaInfo, [:ElapsedRaw, :CPUTimePerRank])
```

Here's the total elapsed time. This looks goofy.
```julia
# Let's make some plots
totalBatchTimePlot = @df batchInfo scatter(:NNodes, :ElapsedRaw, legend=nothing,
    title="Total batch time", xaxis="# of nodes", yaxis="Elapsed time (s)")
```

12 rows × 26 columns (omitted printing of 18 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | |
|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | |
| 1 | 35415305 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 2 | 128 | missing |
| 2 | 35415351 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 3 | 192 | missing |
| 3 | 35415425 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 4 | 256 | missing |
| 4 | 35415476 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 5 | 320 | missing |
| 5 | 35415515 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 6 | 384 | missing |
| 6 | 35415541 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 7 | 448 | missing |
| 7 | 35415576 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 8 | 512 | missing |
| 8 | 35415699 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 9 | 576 | missing |
| 9 | 35415777 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 10 | 640 | missing |
| 10 | 35584495 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 12 | 768 | missing |
| 11 | 35584758 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 15 | 960 | missing |
| 12 | 35587032 | run_strongScalingJob.sh | debug_hsw | COMPLETED | 0:0 | 20 | 1280 | missing |
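The cost transform in the next cell multiplies nodes by elapsed seconds with a factor of `140.0/60/60`, which I read as 140 NERSC units per node-hour (my assumption, from the constant alone). A worked example for the 20-node job above (79 s elapsed):

```julia
# Assuming 140.0/60/60 means 140 NERSC units per node-hour (assumption),
# the 20-node, 79 s job costs roughly:
cost = (140.0 / 60 / 60) * 20 * 79   # about 61.4 NERSC units
```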
```julia
transform!(batchInfo, [:NNodes, :ElapsedRaw] => ((n, e) -> (140.0/60/60) .* n .* e) => :cost)
```

```julia
batchJobCostPlot = @df batchInfo scatter(:NNodes, :cost, legend=nothing,
    title="Job cost", xaxis="# of nodes", yaxis="Cost (NERSC units)")
```

Look at how long Julia itself took
```julia
# How long did Julia take?
juliaElapsedTimePlot = @df juliaInfo scatter(:NNodes, :ElapsedRaw, legend=nothing,
    title="Julia Elapsed Time", xaxis="# of nodes", yaxis="Elapsed time (s)")
# Very much follows the batch elapsed time
```

So, Julia is taking significantly more time than what I recorded in MPI.
```julia
begin
    timingComparisonPlot = @df batchInfo scatter(:NNodes, :ElapsedRaw, label="Total Batch Time")
    @df juliaInfo scatter!(:NNodes, :ElapsedRaw, label="Total Julia time")
    @df totalMPITimes scatter!(:numNodes, :totalTime_maximum, label="MPI Julia time",
        xaxis="# of nodes", yaxis="Elapsed time (s)", legend=:right)
end;
# Something is really goofy here. Is this julia startup and loading libraries?
```

```julia
timingComparisonPlot
```

Note that for 4 nodes, the Julia time plots on top of the batch time point. So there's a lot going on that's not accounted for in my MPI timing.
"JobID"
"JobName"
"QOS"
"State"
"ExitCode"
"NNodes"
"NCPUS"
"Layout"
"CPUTimeRAW"
"ElapsedRaw"
"Submit"
"Start"
"End"
"MaxRSS"
"MaxRSSNode"
"MaxRSSTask"
"MaxVMSize"
"MaxVMSizeNode"
"MaxVMSizeTask"
"MaxDiskRead"
"MaxDiskReadNode"
"MaxDiskReadTask"
"MaxDiskWrite"
"MaxDiskWriteNode"
"MaxDiskWriteTask"
"CPUTimePerRank"
```julia
# Let's look at memory and stuff
names(juliaInfo)
```

| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | CPUTimeRAW | ElapsedRaw | Submit | Start | End | MaxRSS | MaxRSSNode | MaxRSSTask | MaxVMSize | MaxVMSizeNode | MaxVMSizeTask | MaxDiskRead | MaxDiskReadNode | MaxDiskReadTask | MaxDiskWrite | MaxDiskWriteNode | MaxDiskWriteTask | CPUTimePerRank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35415305.0 | julia | COMPLETED | 0:0 | 2 | 64 | Block | 5120 | 80 | 2020-10-22T09:22:04.0 | 2020-10-22T09:22:04.0 | 2020-10-22T09:23:24.0 | 2222612K | nid00881 | 41 | 3048020K | nid00881 | 41 | 921.12M | nid00880 | 25 | 0.38M | nid00880 | 0 | 80 | |
| 35415351.0 | julia | COMPLETED | 0:0 | 3 | 96 | Block | 6048 | 63 | 2020-10-22T09:24:24.0 | 2020-10-22T09:24:24.0 | 2020-10-22T09:25:27.0 | 249871K | nid01149 | 0 | 1803104K | nid01151 | 64 | 19.90M | nid01150 | 32 | 0.37M | nid01149 | 0 | 63 | |
| 35415425.0 | julia | COMPLETED | 0:0 | 4 | 128 | Block | 8064 | 63 | 2020-10-22T09:26:06.0 | 2020-10-22T09:26:06.0 | 2020-10-22T09:27:09.0 | 251817K | nid01243 | 0 | 1633996K | nid01608 | 96 | 19.90M | nid01245 | 32 | 0.37M | nid01243 | 0 | 63 | |
| 35415476.0 | julia | COMPLETED | 0:0 | 5 | 160 | Block | 8800 | 55 | 2020-10-22T09:28:51.0 | 2020-10-22T09:28:51.0 | 2020-10-22T09:29:46.0 | 251853K | nid00749 | 0 | 1531348K | nid00753 | 128 | 19.90M | nid00753 | 128 | 0.37M | nid00749 | 0 | 55 | |
| 35415515.0 | julia | COMPLETED | 0:0 | 6 | 192 | Block | 11712 | 61 | 2020-10-22T09:30:25.0 | 2020-10-22T09:30:25.0 | 2020-10-22T09:31:26.0 | 254048K | nid02060 | 0 | 1464996K | nid02062 | 64 | 19.90M | nid02062 | 64 | 0.37M | nid02060 | 0 | 61 | |
| 35415541.0 | julia | COMPLETED | 0:0 | 7 | 224 | Block | 11872 | 53 | 2020-10-22T09:31:51.0 | 2020-10-22T09:31:51.0 | 2020-10-22T09:32:44.0 | 253960K | nid01866 | 0 | 1416164K | nid01873 | 64 | 19.90M | nid01873 | 64 | 0.37M | nid01866 | 0 | 53 | |
| 35415576.0 | julia | COMPLETED | 0:0 | 8 | 256 | Block | 13568 | 53 | 2020-10-22T09:33:43.0 | 2020-10-22T09:33:43.0 | 2020-10-22T09:34:36.0 | 256221K | nid00880 | 0 | 1381628K | nid01139 | 96 | 19.90M | nid01139 | 96 | 0.37M | nid00880 | 0 | 53 | |
| 35415699.0 | julia | COMPLETED | 0:0 | 9 | 288 | Block | 13824 | 48 | 2020-10-22T09:39:09.0 | 2020-10-22T09:39:09.0 | 2020-10-22T09:39:57.0 | 256088K | nid12971 | 0 | 1353192K | nid12978 | 224 | 19.90M | nid12973 | 64 | 0.37M | nid12971 | 0 | 48 | |
| 35415777.0 | julia | COMPLETED | 0:0 | 10 | 320 | Block | 16000 | 50 | 2020-10-22T09:42:46.0 | 2020-10-22T09:42:46.0 | 2020-10-22T09:43:36.0 | 261248K | nid01187 | 0 | 1333328K | nid01187 | 0 | 30.67M | nid01403 | 254 | 0.37M | nid01187 | 0 | 50 | |
| 35584495.0 | julia | COMPLETED | 0:0 | 12 | 384 | Block | 28800 | 75 | 2020-10-27T09:12:46.0 | 2020-10-27T09:12:46.0 | 2020-10-27T09:14:01.0 | 476295K | nid02065 | 0 | 1314924K | nid12916 | 128 | 171.78M | nid12943 | 306 | 0.43M | nid02065 | 0 | 75 | |
| 35584758.0 | julia | COMPLETED | 0:0 | 15 | 480 | Block | 32640 | 68 | 2020-10-27T09:25:16.0 | 2020-10-27T09:25:16.0 | 2020-10-27T09:26:24.0 | 428873K | nid00822 | 0 | 1306712K | nid00822 | 0 | 141.42M | nid01190 | 128 | 0.54M | nid00822 | 0 | 68 | |
| 35587032.0 | julia | COMPLETED | 0:0 | 20 | 640 | Block | 49280 | 77 | 2020-10-27T09:53:20.0 | 2020-10-27T09:53:20.0 | 2020-10-27T09:54:37.0 | 306812K | nid02081 | 544 | 1222636K | nid00782 | 128 | 20.24M | nid00781 | 126 | 0.37M | nid00778 | 0 | 77 |
```julia
data_table(juliaInfo, items_per_page=20)
```

Here's a function to read in things like "100K", "150.6M", "2.5G" and output a Float in megabytes.

```julia
# Parse a string with "K", "M", "G" and return megabytes
function parseMemoryToMB(mem)
    m = match(r"(\d+\.?\d*)([KMG])", mem)
    v = parse(Float64, m.captures[1])
    s = m.captures[2]
    if s == "K"
        v /= 1024
    elseif s == "G"
        v *= 1024
    end
    v
end
```

Write some tests...
```
Test Summary:        | Pass  Total
Test parseMemoryToMB |    4      4
```

```julia
with_terminal() do
    @testset "Test parseMemoryToMB" begin
        @test parseMemoryToMB("100K") == 100/1024
        @test parseMemoryToMB("100.9K") == 100.9/1024
        @test parseMemoryToMB("100.9M") == 100.9
        @test parseMemoryToMB("100.9G") == 100.9*1024
    end
end
```

```julia
maxVmsizePlot = @df juliaInfo scatter(:NNodes, parseMemoryToMB.(:MaxVMSize) ./ 1024,
    legend=nothing, xaxis="# nodes", yaxis="Max VMSize (GB)")
```

```julia
maxRSSPlot = @df juliaInfo scatter(:NNodes, parseMemoryToMB.(:MaxRSS),
    legend=nothing, xaxis="# nodes", yaxis="Max RSS (MB)")
```

```julia
maxDiskReadPlot = @df juliaInfo scatter(:NNodes, parseMemoryToMB.(:MaxDiskRead),
    legend=nothing, xaxis="# nodes", yaxis="Max MB read off disk")
```

```julia
maxDiskWrite = @df juliaInfo scatter(:NNodes, parseMemoryToMB.(:MaxDiskWrite) .* 1024,
    legend=nothing, xaxis="# nodes", yaxis="Max KB written to disk")
```

Compare debug queue to regular queue
I ran 12, 15 and 20 nodes in the regular queue since getting nodes from the debug queue was very slow. Let's compare accounting info from those queues.
"slurm-reg-35453280_12x32.out"
"slurm-reg-35506920_15x32.out"
"slurm-reg-35506934_20x32.out"
```julia
slurmRegLogFiles = @pipe glob("slurm-reg*.out", datapath) |> basename.(_)
```

"35453280"
"35506920"
"35506934"
```julia
slurmRegIds = jobIdFromSlurmLogName.(slurmRegLogFiles)
```

3 rows × 25 columns (omitted printing of 17 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | |
|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | |
| 1 | 35453280 | run_strongScalingJob.sh | regular_1 | COMPLETED | 0:0 | 12 | 768 | missing |
| 2 | 35506920 | run_strongScalingJob.sh | regular_1 | COMPLETED | 0:0 | 15 | 960 | missing |
| 3 | 35506934 | run_strongScalingJob.sh | regular_1 | COMPLETED | 0:0 | 20 | 1280 | missing |
3 rows × 25 columns (omitted printing of 15 columns)
| JobID | JobName | QOS | State | ExitCode | NNodes | NCPUS | Layout | CPUTimeRAW | ElapsedRaw | |
|---|---|---|---|---|---|---|---|---|---|---|
| String | String | String? | String | String | Int64 | Int64 | String? | Int64 | Int64 | |
| 1 | 35453280.0 | julia | missing | COMPLETED | 0:0 | 12 | 384 | Block | 29568 | 77 |
| 2 | 35506920.0 | julia | missing | COMPLETED | 0:0 | 15 | 480 | Block | 29760 | 62 |
| 3 | 35506934.0 | julia | missing | COMPLETED | 0:0 | 20 | 640 | Block | 44160 | 69 |
```julia
batchRegInfo, juliaRegInfo = selectDesiredJobIds(slurmRegIds) |> splitIntoBatchAndJulia
```

"JobID"
"JobName"
"QOS"
"State"
"ExitCode"
"NNodes"
"NCPUS"
"Layout"
"CPUTimeRAW"
"ElapsedRaw"
"Submit"
"Start"
"End"
"MaxRSS"
"MaxRSSNode"
"MaxRSSTask"
"MaxVMSize"
"MaxVMSizeNode"
"MaxVMSizeTask"
"MaxDiskRead"
"MaxDiskReadNode"
"MaxDiskReadTask"
"MaxDiskWrite"
"MaxDiskWriteNode"
"MaxDiskWriteTask"
"cost"
```julia
names(batchInfo)
```

3 rows × 7 columns
| JobID | NNodes | ElapsedRaw | MaxRSS | MaxVMSize | MaxDiskRead | MaxDiskWrite | |
|---|---|---|---|---|---|---|---|
| String | Int64 | Int64 | String? | String? | String? | String? | |
| 1 | 35584495.0 | 12 | 75 | 476295K | 1314924K | 171.78M | 0.43M |
| 2 | 35584758.0 | 15 | 68 | 428873K | 1306712K | 141.42M | 0.54M |
| 3 | 35587032.0 | 20 | 77 | 306812K | 1222636K | 20.24M | 0.37M |
```julia
@pipe filter(:NNodes => n -> n in [12, 15, 20], juliaInfo) |>
    select(_, [:JobID, :NNodes, :ElapsedRaw, :MaxRSS, :MaxVMSize, :MaxDiskRead, :MaxDiskWrite])
```

3 rows × 7 columns
| JobID | NNodes | ElapsedRaw | MaxRSS | MaxVMSize | MaxDiskRead | MaxDiskWrite | |
|---|---|---|---|---|---|---|---|
| String | Int64 | Int64 | String? | String? | String? | String? | |
| 1 | 35453280.0 | 12 | 77 | 245922K | 1300512K | 20.22M | 0.37M |
| 2 | 35506920.0 | 15 | 62 | 306820K | 791140K | 20.21M | 0.17M |
| 3 | 35506934.0 | 20 | 69 | 357769K | 1259084K | 111.83M | 0.62M |
```julia
select(juliaRegInfo, [:JobID, :NNodes, :ElapsedRaw, :MaxRSS, :MaxVMSize, :MaxDiskRead, :MaxDiskWrite])
```

3 rows × 4 columns
| JobID | QOS | NNodes | ElapsedRaw | |
|---|---|---|---|---|
| String | String? | Int64 | Int64 | |
| 1 | 35584495 | debug_hsw | 12 | 79 |
| 2 | 35584758 | debug_hsw | 15 | 75 |
| 3 | 35587032 | debug_hsw | 20 | 79 |
```julia
@pipe filter(:NNodes => n -> n in [12, 15, 20], batchInfo) |>
    select(_, [:JobID, :QOS, :NNodes, :ElapsedRaw])
```

3 rows × 4 columns
| JobID | QOS | NNodes | ElapsedRaw | |
|---|---|---|---|---|
| String | String? | Int64 | Int64 | |
| 1 | 35453280 | regular_1 | 12 | 81 |
| 2 | 35506920 | regular_1 | 15 | 70 |
| 3 | 35506934 | regular_1 | 20 | 80 |
```julia
select(batchRegInfo, [:JobID, :QOS, :NNodes, :ElapsedRaw])
```